Statistical Model Criticism using Kernel Two Sample Tests

نویسندگان

  • James Robert Lloyd
  • Zoubin Ghahramani
چکیده

We propose an exploratory approach to statistical model criticism using maximum mean discrepancy (MMD) two sample tests. Typical approaches to model criticism require a practitioner to select a statistic by which to measure discrepancies between data and a statistical model. MMD two sample tests are instead constructed as an analytic maximisation over a large space of possible statistics and therefore automatically select the statistic which most shows any discrepancy. We demonstrate on synthetic data that the selected statistic, called the witness function, can be used to identify where a statistical model most misrepresents the data it was trained on. We then apply the procedure to real data where the models being assessed are restricted Boltzmann machines, deep belief networks and Gaussian process regression and demonstrate the ways in which these models fail to capture the properties of the data they are trained on.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Topics in kernel hypothesis testing

This thesis investigates some unaddressed problems in kernel nonparametrichypothesis testing. The contributions are grouped around three main themes:Wild Bootstrap for Degenerate Kernel Tests. A wild bootstrap method for non-parametric hypothesis tests based on kernel distribution embeddings is pro-posed. This bootstrap method is used to construct provably consistent teststh...

متن کامل

Kernels Based Tests with Non-asymptotic Bootstrap Approaches for Two-sample Problems

Considering either two independent i.i.d. samples, or two independent samples generated from a heteroscedastic regression model, or two independent Poisson processes, we address the question of testing equality of their respective distributions. We first propose single testing procedures based on a general symmetric kernel. The corresponding critical values are chosen from a wild or permutation...

متن کامل

Bayesian Learning of Kernel Embeddings

Kernel methods are one of the mainstays of machine learning, but the problem of kernel learning remains challenging, with only a few heuristics and very little theory. This is of particular importance in methods based on estimation of kernel mean embeddings of probability measures. For characteristic kernels, which include most commonly used ones, the kernel mean embedding uniquely determines i...

متن کامل

A Kernel Test of Goodness of Fit

We propose a nonparametric statistical test for goodness-of-fit: given a set of samples, the test determines how likely it is that these were generated from a target density function. The measure of goodness-of-fit is a divergence constructed via Stein’s method using functions from a Reproducing Kernel Hilbert Space. Our test statistic is based on an empirical estimate of this divergence, takin...

متن کامل

A Kernel Method for the Two-Sample-Problem

We propose a framework for analyzing and comparing distributions, allowing us to design statistical tests to determine if two samples are drawn from different distributions. Our test statistic is the largest difference in expectations over functions in the unit ball of a reproducing kernel Hilbert space (RKHS). We present two tests based on large deviation bounds for the test statistic, while a...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2015